The aim of these exercises is to continue making sure you’re comfortable with handling multivariate data. In this chapter’s, you’ll focus on analyses based on dissimilarities/distances, including fitting linear models to these kinds of response variables.
For each of the examples below, you should follow the sequence we’ve used previously, as far as it’s sensible:
What is the biological question?
Is the predictor continuous or categorical?
Write out the linear model corresponding to this question.
What distribution do you expect the response variable to follow?
What are the assumptions behind the statistical model you’ll fit?
Fit the model
How will you assess whether the model fits well?
Can you detect an effect of the predictors?
How do you measure the effect?
What do you conclude (including any cautions)
Dixon et al. (2018) focused on assemblages of reptiles in forests and woodlands of southeastern Australia, with a particular interest in relationships between these assemblages and the fire history of the landscape. They identified 81 sites that varied in time since fire, from 6 months to >96 y. Rather than a continuum, three categories of time since fire were used, 0.5-2y, 6-12y, and >96y. Sites were also classified according to habitat.
Reptile assemblages were sampled with a range of methods, including visual surveys and camera traps, to give counts of 20 reptiles, whose abundance ranged from 0-1 to 0-126, depending on species.
Data are available from dryad, as Rep_abund.csv. You’ll want to focus on the columns with reptile numbers, and the one with the fire history (tsf). For convenience, we’ve extracted those data as dixonbiota.csv
Start by having a quick look at the data.
df <- read_csv("../data/dixonbiota.csv")
Rows: 79 Columns: 22── Column specification ────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (2): site, tsf
dbl (20): adup, aplat, amur, amac, aram, dcor, ecun, esax, eul, htal, ldel, lgui, lwhi, ppor...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(df, 10)
The file is pretty straightforward, with tsf being the fire category, and columns 3-22 each representing one reptile taxon.
Repeat the preceding analysis after creating a new data file that is presence-absence only.
df1.jac <- vegdist(df1,binary=TRUE,method='jaccard')
#Run 3d first
df1s.mds3 <- metaMDS(df1.jac,k=3,autotransform=FALSE,try=40,trymax=80,maxit=200)
Run 0 stress 0.1404128
Run 1 stress 0.1478658
Run 2 stress 0.1416978
Run 3 stress 0.1405588
... Procrustes: rmse 0.01500187 max resid 0.08886188
Run 4 stress 0.1404109
... New best solution
... Procrustes: rmse 0.002860922 max resid 0.02265594
Run 5 stress 0.145486
Run 6 stress 0.1476233
Run 7 stress 0.1418068
Run 8 stress 0.1423797
Run 9 stress 0.1457509
Run 10 stress 0.1416979
Run 11 stress 0.1404126
... Procrustes: rmse 0.002853227 max resid 0.02281661
Run 12 stress 0.140558
... Procrustes: rmse 0.01456468 max resid 0.08805689
Run 13 stress 0.1414712
Run 14 stress 0.1411415
Run 15 stress 0.1405617
... Procrustes: rmse 0.0149327 max resid 0.0899859
Run 16 stress 0.1415481
Run 17 stress 0.1404251
... Procrustes: rmse 0.00228483 max resid 0.01842417
Run 18 stress 0.1428212
Run 19 stress 0.1405586
... Procrustes: rmse 0.01462785 max resid 0.08844371
Run 20 stress 0.1404136
... Procrustes: rmse 0.003032611 max resid 0.02427334
Run 21 stress 0.1405605
... Procrustes: rmse 0.01485076 max resid 0.08965517
Run 22 stress 0.1416429
Run 23 stress 0.1499082
Run 24 stress 0.1420545
Run 25 stress 0.1419425
Run 26 stress 0.1405616
... Procrustes: rmse 0.01492593 max resid 0.08996847
Run 27 stress 0.1405633
... Procrustes: rmse 0.0149849 max resid 0.09014188
Run 28 stress 0.1404112
... Procrustes: rmse 0.0001823383 max resid 0.0008661697
... Similar to previous best
Run 29 stress 0.1418758
Run 30 stress 0.1485354
Run 31 stress 0.1419033
Run 32 stress 0.1458155
Run 33 stress 0.1404104
... New best solution
... Procrustes: rmse 0.0002170324 max resid 0.001584222
... Similar to previous best
Run 34 stress 0.1404102
... New best solution
... Procrustes: rmse 0.0002560659 max resid 0.001758468
... Similar to previous best
Run 35 stress 0.1404098
... New best solution
... Procrustes: rmse 0.001533377 max resid 0.01214581
Run 36 stress 0.1414897
Run 37 stress 0.1404105
... Procrustes: rmse 0.0003876278 max resid 0.003003357
... Similar to previous best
Run 38 stress 0.1443789
Run 39 stress 0.1404132
... Procrustes: rmse 0.002460885 max resid 0.01976426
Run 40 stress 0.1488112
*** Best solution repeated 1 times
stressplot(df1s.mds3, main="Shepard plot")
df1s.mds3
Call:
metaMDS(comm = df1.jac, k = 3, try = 40, trymax = 80, autotransform = FALSE, maxit = 200)
global Multidimensional Scaling using monoMDS
Data: df1.jac
Distance: binary jaccard
Dimensions: 3
Stress: 0.1404098
Stress type 1, weak ties
Best solution was repeated 1 time in 40 tries
The best solution was from try 35 (random start)
Scaling: centring, PC rotation, halfchange scaling
Species: scores missing
# Now look at 2d
df1s.mds2 <- metaMDS(df1.jac,k=2,autotransform=FALSE,try=40,trymax=80,maxit=200)
Run 0 stress 0.2009509
Run 1 stress 0.2057315
Run 2 stress 0.2115578
Run 3 stress 0.2019149
Run 4 stress 0.2042377
Run 5 stress 0.2295765
Run 6 stress 0.2137388
Run 7 stress 0.2200574
Run 8 stress 0.2095372
Run 9 stress 0.2181162
Run 10 stress 0.2120083
Run 11 stress 0.2100096
Run 12 stress 0.2094365
Run 13 stress 0.2059035
Run 14 stress 0.2068707
Run 15 stress 0.2138507
Run 16 stress 0.2247804
Run 17 stress 0.2205491
Run 18 stress 0.2031129
Run 19 stress 0.2066478
Run 20 stress 0.2133806
Run 21 stress 0.2148553
Run 22 stress 0.2136219
Run 23 stress 0.2228445
Run 24 stress 0.2027397
Run 25 stress 0.214167
Run 26 stress 0.2118895
Run 27 stress 0.2011779
... Procrustes: rmse 0.005744405 max resid 0.04013641
Run 28 stress 0.2011764
... Procrustes: rmse 0.005803269 max resid 0.03983499
Run 29 stress 0.2031113
Run 30 stress 0.2172545
Run 31 stress 0.2109358
Run 32 stress 0.2081906
Run 33 stress 0.2095477
Run 34 stress 0.2156386
Run 35 stress 0.2184839
Run 36 stress 0.2011783
... Procrustes: rmse 0.006120401 max resid 0.03870025
Run 37 stress 0.2133788
Run 38 stress 0.2175712
Run 39 stress 0.2171102
Run 40 stress 0.2155669
Run 41 stress 0.209539
Run 42 stress 0.2014972
Run 43 stress 0.2135397
Run 44 stress 0.2148463
Run 45 stress 0.2145609
Run 46 stress 0.2042383
Run 47 stress 0.2122397
Run 48 stress 0.2221256
Run 49 stress 0.2075648
Run 50 stress 0.21128
Run 51 stress 0.2035834
Run 52 stress 0.2009593
... Procrustes: rmse 0.004919144 max resid 0.03284558
Run 53 stress 0.2262974
Run 54 stress 0.2096971
Run 55 stress 0.2233517
Run 56 stress 0.2040341
Run 57 stress 0.2081925
Run 58 stress 0.2031114
Run 59 stress 0.2030044
Run 60 stress 0.2035828
Run 61 stress 0.2045719
Run 62 stress 0.2135396
Run 63 stress 0.2209364
Run 64 stress 0.2156286
Run 65 stress 0.2015097
Run 66 stress 0.2168916
Run 67 stress 0.2036082
Run 68 stress 0.209978
Run 69 stress 0.202577
Run 70 stress 0.2074581
Run 71 stress 0.2119757
Run 72 stress 0.2038332
Run 73 stress 0.2021148
Run 74 stress 0.2150588
Run 75 stress 0.21096
Run 76 stress 0.2260586
Run 77 stress 0.2030607
Run 78 stress 0.2131804
Run 79 stress 0.2133794
Run 80 stress 0.218075
*** Best solution was not repeated -- monoMDS stopping criteria:
13: no. of iterations >= maxit
67: stress ratio > sratmax
stressplot(df1s.mds2, main="Shepard plot")
df1s.mds2
Call:
metaMDS(comm = df1.jac, k = 2, try = 40, trymax = 80, autotransform = FALSE, maxit = 200)
global Multidimensional Scaling using monoMDS
Data: df1.jac
Distance: binary jaccard
Dimensions: 2
Stress: 0.2009509
Stress type 1, weak ties
Best solution was not repeated after 80 tries
The best solution was from try 0 (metric scaling or null solution)
Scaling: centring, PC rotation, halfchange scaling
Species: scores missing
Marginal decision here - stress just over 0.2 for k=2, so might be OK. Stress good for k=3, but 2-d MDS usually better for reporting or showing to audience
a<-as.data.frame(scores(df1s.mds3))
a<-cbind(df[c(1:2)],a) #Add site names & symbols from original data file
p1<-ggplot(data=a, aes(x=NMDS1, y=NMDS2, color=tsf, ) )+
geom_point()+
stat_ellipse()+
labs(y="MDS2", x="MDS1")+
scale_shape_manual(values=sym3,
name="TSF",
guide =
guide_legend(label.theme = element_text(size=6),
title=NULL)
)
p2<-ggplot(data=a, aes(x=NMDS1, y=NMDS3, color = tsf, ) )+
geom_point()+
stat_ellipse()+
labs(y="MDS3", x="MDS1")+
scale_shape_manual(values=sym3,
name="TSF",
guide =
guide_legend(label.theme = element_text(size=6),
title=NULL)
)
p1 + p2 + plot_layout(guides='collect') & theme_qk() & scale_color_uchicago()
Unburned sites separate very clearly on MDS1. Pattern clearer than with abundances included
Examine dispersions
df.disp <- betadisper(df1.jac,df$tsf)
anova(df.disp)
Analysis of Variance Table
Response: Distances
Df Sum Sq Mean Sq F value Pr(>F)
Groups 2 0.18892 0.094460 6.0078 0.003781 **
Residuals 76 1.19495 0.015723
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Still some difference, though not as pronounced when only pres-abs used
Do permanova including tsf
df1.ado <- adonis2(df1.jac~tsf,data=df,permutations=999)
print(df1.ado)
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = df1.jac ~ tsf, data = df, permutations = 999)
Df SumOfSqs R2 F Pr(>F)
tsf 2 3.7853 0.20678 9.9057 0.001 ***
Residual 76 14.5210 0.79322
Total 78 18.3062 1.00000
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Clear fire signal
Dixon and her colleagues also recorded a range of habitat variables, which we’ve collected for you in dixonenv.csv
dixonenv <- read_csv("../data/dixonenv.csv")
Rows: 79 Columns: 14── Column specification ────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (4): site, tsf, veg, aspect
dbl (10): lcwd, shrcov, grcov, litcov, litdep, rocks, elev, warm, cold, twi
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(dixonenv, 10)
They recorded Coarse Woody Debris, which was long-transformed, litter cover (litcov), % cover of groundcover (grcov), and % cover of shrubs (shrcov), and rock cover. In the original paper, Dixon et al. were comfortable that these predictors weren’t correlated.
dfmv <- mvabund(df1)
df.mv <- manyglm(dfmv~dixonenv$tsf+dixonenv$lcwd+dixonenv$litcov+dixonenv$grcov+dixonenv$shrcov+dixonenv$lrocks,family="poisson")
plot(df.mv)
# still hetero vars so use -ve binomial
df.mv1 <- manyglm(dfmv~dixonenv$tsf+dixonenv$lcwd+dixonenv$litcov+dixonenv$grcov+dixonenv$shrcov+dixonenv$lrocks,family="negative_binomial")
plot(df.mv1)
anova(df.mv1)
Time elapsed: 0 hr 1 min 37 sec
Analysis of Deviance Table
Model: dfmv ~ dixonenv$tsf + dixonenv$lcwd + dixonenv$litcov + dixonenv$grcov + dixonenv$shrcov + dixonenv$lrocks
Multivariate test:
Res.Df Df.diff Dev Pr(>Dev)
(Intercept) 78
dixonenv$tsf 76 2 331.6 0.001 ***
dixonenv$lcwd 75 1 61.5 0.001 ***
dixonenv$litcov 74 1 25.9 0.287
dixonenv$grcov 73 1 36.4 0.073 .
dixonenv$shrcov 72 1 21.5 0.601
dixonenv$lrocks 71 1 58.0 0.003 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Arguments:
Test statistics calculated assuming uncorrelated response (for faster computation)
P-value calculated using 999 iterations via PIT-trap resampling.
Let’s return to the Hutto and Barrett (2021) example from the Chapter 15 exercises. There, we used similarity-based analyses to explore the relationship between frog assemblages in ponds and the degree of urbanization surrounding. This seems a question that could just as easily be examined using distances or dissimilarities.
Return to the data and assess whether frog assemblages (using the standardized abundance scale or presence-absence) differ with urbanization.
df <- read_csv("../data/huttoamph.csv")
Rows: 50 Columns: 14── Column specification ────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (1): type
dbl (13): Site, ACRE, BAME, BFOW, GCAR, HCIN, HVER, LCAT, LCLA, LPAL, LSPH, PCRU, PFER
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(df,10)
#ponds 9 and 40 had no frogs. Remove for distance analysis
df <- df %>%
filter(Site!= 9 & Site !=40)
df
The explanation of the variables is in the previous chapter’s exercises.
df1 <- df[,-(1:2)] #Stripped back file without labels, leaving only
df1.bc <- vegdist(df1,'bray')
#Run 2d first
df1.mds2 <- metaMDS(df1.bc,k=2,autotransform=FALSE,try=40,trymax=80,maxit=200)
Run 0 stress 0.196708
Run 1 stress 0.1874607
... New best solution
... Procrustes: rmse 0.06607221 max resid 0.2294443
Run 2 stress 0.1949763
Run 3 stress 0.1964022
Run 4 stress 0.1990471
Run 5 stress 0.1916105
Run 6 stress 0.2037482
Run 7 stress 0.1961229
Run 8 stress 0.1888276
Run 9 stress 0.1977393
Run 10 stress 0.2035879
Run 11 stress 0.2060001
Run 12 stress 0.1862237
... New best solution
... Procrustes: rmse 0.1042587 max resid 0.4865567
Run 13 stress 0.1888252
Run 14 stress 0.1920872
Run 15 stress 0.2075707
Run 16 stress 0.1854506
... New best solution
... Procrustes: rmse 0.06781964 max resid 0.2195375
Run 17 stress 0.1974324
Run 18 stress 0.1853453
... New best solution
... Procrustes: rmse 0.006310912 max resid 0.03189332
Run 19 stress 0.203107
Run 20 stress 0.1888252
Run 21 stress 0.1875614
Run 22 stress 0.1862238
Run 23 stress 0.1973006
Run 24 stress 0.1853452
... New best solution
... Procrustes: rmse 0.0001046814 max resid 0.0004231746
... Similar to previous best
Run 25 stress 0.2077143
Run 26 stress 0.1967075
Run 27 stress 0.1862781
Run 28 stress 0.1933023
Run 29 stress 0.1877629
Run 30 stress 0.1862237
Run 31 stress 0.2165344
Run 32 stress 0.1862239
Run 33 stress 0.1929084
Run 34 stress 0.196071
Run 35 stress 0.1967906
Run 36 stress 0.1990753
Run 37 stress 0.1862236
Run 38 stress 0.1888204
Run 39 stress 0.1877626
Run 40 stress 0.1920872
*** Best solution repeated 1 times
stressplot(df1.mds2, main="Shepard plot")
df1.mds2
Call:
metaMDS(comm = df1.bc, k = 2, try = 40, trymax = 80, autotransform = FALSE, maxit = 200)
global Multidimensional Scaling using monoMDS
Data: df1.bc
Distance: bray
Dimensions: 2
Stress: 0.1853452
Stress type 1, weak ties
Best solution was repeated 1 time in 40 tries
The best solution was from try 24 (random start)
Scaling: centring, PC rotation, halfchange scaling
Species: scores missing
# Now look at 3d
df.mds3 <- metaMDS(df1.bc,k=3,autotransform=FALSE,try=40,trymax=80,maxit=200)
Run 0 stress 0.129187
Run 1 stress 0.1384284
Run 2 stress 0.1384287
Run 3 stress 0.1289617
... New best solution
... Procrustes: rmse 0.01116102 max resid 0.05648397
Run 4 stress 0.128962
... Procrustes: rmse 0.001080951 max resid 0.006138037
... Similar to previous best
Run 5 stress 0.1380872
Run 6 stress 0.1291861
... Procrustes: rmse 0.01054685 max resid 0.05449119
Run 7 stress 0.1291862
... Procrustes: rmse 0.01108717 max resid 0.05660768
Run 8 stress 0.1289618
... Procrustes: rmse 0.0009977382 max resid 0.005680432
... Similar to previous best
Run 9 stress 0.1380886
Run 10 stress 0.1291601
... Procrustes: rmse 0.009846495 max resid 0.05162542
Run 11 stress 0.1289617
... Procrustes: rmse 0.0001180296 max resid 0.000397725
... Similar to previous best
Run 12 stress 0.1388068
Run 13 stress 0.1289614
... New best solution
... Procrustes: rmse 0.0002094611 max resid 0.001134175
... Similar to previous best
Run 14 stress 0.1291858
... Procrustes: rmse 0.01061726 max resid 0.05490047
Run 15 stress 0.138595
Run 16 stress 0.1384304
Run 17 stress 0.1291863
... Procrustes: rmse 0.01062772 max resid 0.0549278
Run 18 stress 0.1291602
... Procrustes: rmse 0.009566248 max resid 0.05100092
Run 19 stress 0.1291864
... Procrustes: rmse 0.01111886 max resid 0.05672889
Run 20 stress 0.1383909
Run 21 stress 0.1380887
Run 22 stress 0.1384274
Run 23 stress 0.1384276
Run 24 stress 0.128962
... Procrustes: rmse 0.0002702949 max resid 0.00160678
... Similar to previous best
Run 25 stress 0.1291856
... Procrustes: rmse 0.01060586 max resid 0.05479499
Run 26 stress 0.1289617
... Procrustes: rmse 0.0002416892 max resid 0.001126536
... Similar to previous best
Run 27 stress 0.1383939
Run 28 stress 0.1291857
... Procrustes: rmse 0.01062561 max resid 0.05487515
Run 29 stress 0.1380879
Run 30 stress 0.1291609
... Procrustes: rmse 0.01000971 max resid 0.05217349
Run 31 stress 0.1291859
... Procrustes: rmse 0.01092252 max resid 0.05627383
Run 32 stress 0.1291858
... Procrustes: rmse 0.01090207 max resid 0.05607242
Run 33 stress 0.1291861
... Procrustes: rmse 0.01051196 max resid 0.05451072
Run 34 stress 0.1289618
... Procrustes: rmse 0.0002363175 max resid 0.001405223
... Similar to previous best
Run 35 stress 0.1291857
... Procrustes: rmse 0.01088096 max resid 0.05598853
Run 36 stress 0.1289613
... New best solution
... Procrustes: rmse 0.0001167837 max resid 0.0004408721
... Similar to previous best
Run 37 stress 0.1314089
Run 38 stress 0.1384272
Run 39 stress 0.1289616
... Procrustes: rmse 0.0002186302 max resid 0.001234739
... Similar to previous best
Run 40 stress 0.1291857
... Procrustes: rmse 0.01067024 max resid 0.0551978
*** Best solution repeated 2 times
stressplot(df.mds3, main="Shepard plot")
df.mds3
Call:
metaMDS(comm = df1.bc, k = 3, try = 40, trymax = 80, autotransform = FALSE, maxit = 200)
global Multidimensional Scaling using monoMDS
Data: df1.bc
Distance: bray
Dimensions: 3
Stress: 0.1289613
Stress type 1, weak ties
Best solution was repeated 2 times in 40 tries
The best solution was from try 36 (random start)
Scaling: centring, PC rotation, halfchange scaling
Species: scores missing
2-d is acceptable
a<-as.data.frame(scores(df1.mds2))
a<-cbind(df[c(1:2)],a) #Add site names & symbols from original data file
ggplot(data=a, aes(x=NMDS1, y=NMDS2, color=type, ) )+
geom_point()+
labs(y="MDS2", x="MDS1")+
scale_shape_manual(values=sym3,
name="Urbanization",
guide =
guide_legend(label.theme = element_text(size=6),
title=NULL)
)+
theme_qk() + scale_color_uchicago()
df1.ado <- adonis2(df1.bc~type,data=df,permutations=999)
print(df1.ado)
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = df1.bc ~ type, data = df, permutations = 999)
Df SumOfSqs R2 F Pr(>F)
type 2 0.912 0.07281 1.7669 0.043 *
Residual 45 11.614 0.92719
Total 47 12.526 1.00000
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Permanova detects a difference; MDS plot suggests low urbanization may be different from the other two, though worth looking at dispersion as well, as spread of this type is less than the other two on plot
Repeat analysis with presence-absence
df1.jac <- vegdist(df1,binary=TRUE,method=‘jaccard’)
#Run 2d first
df1.jac <- vegdist(df1,binary=TRUE,method='jaccard')
df1.mds2 <- metaMDS(df1.jac,k=2,autotransform=FALSE,try=40,trymax=80,maxit=200)
Run 0 stress 0.1551612
Run 1 stress 0.1837013
Run 2 stress 0.1696919
Run 3 stress 0.1880348
Run 4 stress 0.1635149
Run 5 stress 0.1630923
Run 6 stress 0.1585799
Run 7 stress 0.1536788
... New best solution
... Procrustes: rmse 0.01900199 max resid 0.06508032
Run 8 stress 0.1934746
Run 9 stress 0.1538245
... Procrustes: rmse 0.01100139 max resid 0.05222997
Run 10 stress 0.1609337
Run 11 stress 0.1622269
Run 12 stress 0.1704331
Run 13 stress 0.1602261
Run 14 stress 0.1538244
... Procrustes: rmse 0.01099654 max resid 0.05221544
Run 15 stress 0.1696919
Run 16 stress 0.1621113
Run 17 stress 0.1630125
Run 18 stress 0.1585799
Run 19 stress 0.166129
Run 20 stress 0.1641746
Run 21 stress 0.1586729
Run 22 stress 0.1887895
Run 23 stress 0.1585798
Run 24 stress 0.1606139
Run 25 stress 0.163515
Run 26 stress 0.1739632
Run 27 stress 0.1709241
Run 28 stress 0.1772585
Run 29 stress 0.1850913
Run 30 stress 0.1630924
Run 31 stress 0.1721432
Run 32 stress 0.1608313
Run 33 stress 0.1808947
Run 34 stress 0.1602261
Run 35 stress 0.1866374
Run 36 stress 0.1555137
Run 37 stress 0.1691747
Run 38 stress 0.1952929
Run 39 stress 0.1538807
... Procrustes: rmse 0.008809693 max resid 0.04417994
Run 40 stress 0.1806084
Run 41 stress 0.1991962
Run 42 stress 0.2043037
Run 43 stress 0.1646708
Run 44 stress 0.1608262
Run 45 stress 0.1634767
Run 46 stress 0.1586728
Run 47 stress 0.1585799
Run 48 stress 0.1627099
Run 49 stress 0.1565718
Run 50 stress 0.1748722
Run 51 stress 0.1627089
Run 52 stress 0.1622264
Run 53 stress 0.1758648
Run 54 stress 0.1586729
Run 55 stress 0.1724591
Run 56 stress 0.1777698
Run 57 stress 0.1606135
Run 58 stress 0.1835377
Run 59 stress 0.1591044
Run 60 stress 0.1538245
... Procrustes: rmse 0.01100665 max resid 0.0522056
Run 61 stress 0.1591019
Run 62 stress 0.1732141
Run 63 stress 0.1641745
Run 64 stress 0.1704183
Run 65 stress 0.1535586
... New best solution
... Procrustes: rmse 0.0107227 max resid 0.05414397
Run 66 stress 0.1695828
Run 67 stress 0.1586729
Run 68 stress 0.1609028
Run 69 stress 0.1609341
Run 70 stress 0.1781204
Run 71 stress 0.1551611
Run 72 stress 0.1585798
Run 73 stress 0.1627113
Run 74 stress 0.1930708
Run 75 stress 0.1721433
Run 76 stress 0.1673092
Run 77 stress 0.1538804
... Procrustes: rmse 0.01706038 max resid 0.08568356
Run 78 stress 0.1585799
Run 79 stress 0.1535586
... Procrustes: rmse 0.0002906816 max resid 0.001349569
... Similar to previous best
*** Best solution repeated 1 times
stressplot(df1.mds2, main="Shepard plot")
df1.mds2
Call:
metaMDS(comm = df1.jac, k = 2, try = 40, trymax = 80, autotransform = FALSE, maxit = 200)
global Multidimensional Scaling using monoMDS
Data: df1.jac
Distance: binary jaccard
Dimensions: 2
Stress: 0.1535586
Stress type 1, weak ties
Best solution was repeated 1 time in 79 tries
The best solution was from try 65 (random start)
Scaling: centring, PC rotation, halfchange scaling
Species: scores missing
# Now look at 3d
df.mds3 <- metaMDS(df1.jac,k=3,autotransform=FALSE,try=40,trymax=80,maxit=200)
Run 0 stress 0.09466813
Run 1 stress 0.09466822
... Procrustes: rmse 0.0005907259 max resid 0.002917344
... Similar to previous best
Run 2 stress 0.09466795
... New best solution
... Procrustes: rmse 0.000422125 max resid 0.001876703
... Similar to previous best
Run 3 stress 0.1097137
Run 4 stress 0.1090623
Run 5 stress 0.09466803
... Procrustes: rmse 7.232359e-05 max resid 0.0003468283
... Similar to previous best
Run 6 stress 0.1005051
Run 7 stress 0.09466808
... Procrustes: rmse 0.0001823469 max resid 0.000917663
... Similar to previous best
Run 8 stress 0.09693885
Run 9 stress 0.09466795
... Procrustes: rmse 0.0001688021 max resid 0.0006306265
... Similar to previous best
Run 10 stress 0.09466813
... Procrustes: rmse 0.0002394501 max resid 0.001007719
... Similar to previous best
Run 11 stress 0.09466792
... New best solution
... Procrustes: rmse 0.000156739 max resid 0.0005673598
... Similar to previous best
Run 12 stress 0.100505
Run 13 stress 0.09466824
... Procrustes: rmse 0.0001553529 max resid 0.0006098347
... Similar to previous best
Run 14 stress 0.09466817
... Procrustes: rmse 0.0001279451 max resid 0.0004343751
... Similar to previous best
Run 15 stress 0.1005054
Run 16 stress 0.09466821
... Procrustes: rmse 0.0001518064 max resid 0.0006174398
... Similar to previous best
Run 17 stress 0.09466807
... Procrustes: rmse 0.0001705385 max resid 0.00053697
... Similar to previous best
Run 18 stress 0.10049
Run 19 stress 0.1004898
Run 20 stress 0.09466785
... New best solution
... Procrustes: rmse 0.0003178359 max resid 0.001519645
... Similar to previous best
Run 21 stress 0.09693835
Run 22 stress 0.09466791
... Procrustes: rmse 0.0003074555 max resid 0.001502487
... Similar to previous best
Run 23 stress 0.1090588
Run 24 stress 0.1005046
Run 25 stress 0.09466806
... Procrustes: rmse 0.0001456666 max resid 0.0008120412
... Similar to previous best
Run 26 stress 0.09466794
... Procrustes: rmse 0.000286181 max resid 0.001024811
... Similar to previous best
Run 27 stress 0.1005051
Run 28 stress 0.09466829
... Procrustes: rmse 0.0004617523 max resid 0.002151037
... Similar to previous best
Run 29 stress 0.09466836
... Procrustes: rmse 0.0002740026 max resid 0.001533354
... Similar to previous best
Run 30 stress 0.09466781
... New best solution
... Procrustes: rmse 0.0001601026 max resid 0.0007993518
... Similar to previous best
Run 31 stress 0.09693822
Run 32 stress 0.09693828
Run 33 stress 0.09466811
... Procrustes: rmse 0.0003134413 max resid 0.00135763
... Similar to previous best
Run 34 stress 0.09466819
... Procrustes: rmse 0.0003169054 max resid 0.001196281
... Similar to previous best
Run 35 stress 0.1090587
Run 36 stress 0.1115488
Run 37 stress 0.1005051
Run 38 stress 0.10049
Run 39 stress 0.09466797
... Procrustes: rmse 0.000226344 max resid 0.0005714952
... Similar to previous best
Run 40 stress 0.09466804
... Procrustes: rmse 0.0002594792 max resid 0.00106972
... Similar to previous best
*** Best solution repeated 5 times
stressplot(df.mds3, main="Shepard plot")
df.mds3
Call:
metaMDS(comm = df1.jac, k = 3, try = 40, trymax = 80, autotransform = FALSE, maxit = 200)
global Multidimensional Scaling using monoMDS
Data: df1.jac
Distance: binary jaccard
Dimensions: 3
Stress: 0.09466781
Stress type 1, weak ties
Best solution was repeated 5 times in 40 tries
The best solution was from try 30 (random start)
Scaling: centring, PC rotation, halfchange scaling
Species: scores missing
Again, stress OK with 2D, good with 3D. Look at 2D first
a<-as.data.frame(scores(df1.mds2))
a<-cbind(df[c(1:2)],a) #Add site names & symbols from original data file
ggplot(data=a, aes(x=NMDS1, y=NMDS2, color=type, ) )+
geom_point()+
labs(y="MDS2", x="MDS1")+
scale_shape_manual(values=sym3,
name="Urbanization",
guide =
guide_legend(label.theme = element_text(size=6),
title=NULL)
)+
theme_qk() + scale_color_uchicago()
df1.ado <- adonis2(df1.jac~type,data=df,permutations=999)
print(df1.ado)
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = df1.jac ~ type, data = df, permutations = 999)
Df SumOfSqs R2 F Pr(>F)
type 2 0.4525 0.04389 1.0328 0.416
Residual 45 9.8587 0.95611
Total 47 10.3112 1.00000
No separation seen on Presence-Absence
Griffen et al. (2019) examined the oral microbiome of humans, with a focus on the effects of HIV and AntiRetroviral therapy (ART) on this microbiome. They described the microbiome of 341 patients, who fell into one of two categories: HIV-, and HIV+ with ART. They also matched their samples as far as possible for sex, along with other conditions that can influence oral microbiomes (Candida infection, current smoking). We’ll use their records for these other categories as well. There were more HIV+ than - subjects, but within these groups, approximately equal numbers of two sexes, two Candida infection status, and two smoking categories.
The bacteriome was recorded separately using 16S RNA sequencing, which overed just over 600 taxa.
The data are available from their paper, as two Excel files in the supplementary information. The metadata gives HIV status, and a raft of demographic information. We’ll just work with HIV, sex, Candida, and current smoking. A second sheet has the bacteriome. The code chunk below reads this file (from a Downloads folder - you’ll need to modify the file location). It also screens the file for any bacterial taxa that were not present in this particular study, which reduces the bacterial taxa to 599.
library(readxl)
#Read in metadata. Lots of information we don't need, so a couple of iterations to get down to a few columns we want - HIV status, candida, current smoking, gender
df <- read_excel("../data/41598_2019_55703_MOESM3_ESM.xlsx",
range = "A1:S342", na = "NA")
df<-select(df, HIV, candida, gender, smoking_current)
head(df)
#df1 is the bacterial data
df1 <- read_excel("../data/41598_2019_55703_MOESM2_ESM.xlsx")
New names:
head(df1,10)
df1 <- df1 %>%
select(where( ~ is.numeric(.x) && sum(.x) != 0)) #Drops any bacterial taxon not recorded in this study (i.e. where it's all zeroes)
df1 <- df1[,-1] #remove col1, which is sample ID
Do these factors act independently on the bacteriome?
Which effects are largest (use some graphical methods)?
First steps: Think about standardization and which distance measure you’ll use. Bray-Curtis seems common for these kind of data, and and counts of different taxa vary widely.
df1s <- wisconsin(df1) #Common to do some standardisation, and counts of different taxa vary widely
df1s.bc <- vegdist(df1s,'bray') #With standardisation (wisconsin)
Run factorial model using permanova, based on B-C
df1.ado <- adonis2(df1s.bc~HIV*candida*gender*smoking_current,data=df,permutations=999)
print(df1.ado)
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = df1s.bc ~ HIV * candida * gender * smoking_current, data = df, permutations = 999)
Df SumOfSqs R2 F Pr(>F)
HIV 1 1.369 0.01250 4.3776 0.001 ***
candida 1 1.461 0.01333 4.6694 0.001 ***
gender 1 0.492 0.00449 1.5737 0.013 *
smoking_current 1 1.111 0.01014 3.5513 0.001 ***
HIV:candida 1 0.318 0.00290 1.0151 0.386
HIV:gender 1 0.354 0.00324 1.1332 0.209
candida:gender 1 0.257 0.00234 0.8202 0.843
HIV:smoking_current 1 0.269 0.00245 0.8584 0.769
candida:smoking_current 1 0.344 0.00314 1.0989 0.238
gender:smoking_current 1 0.394 0.00360 1.2602 0.104
HIV:candida:gender 1 0.349 0.00319 1.1164 0.234
HIV:candida:smoking_current 1 0.280 0.00255 0.8943 0.687
HIV:gender:smoking_current 1 0.265 0.00242 0.8463 0.793
candida:gender:smoking_current 1 0.254 0.00232 0.8110 0.857
HIV:candida:gender:smoking_current 1 0.362 0.00330 1.1562 0.190
Residual 325 101.664 0.92809
Total 340 109.541 1.00000
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Model shows main effects, but no interactions important (or at least, detected)
Could you fit a simpler model to the data?
Run simpler model
df2.ado <- adonis2(df1s.bc~HIV+candida+gender+smoking_current,data=df,permutations=999)
print(df2.ado)
Permutation test for adonis under reduced model
Terms added sequentially (first to last)
Permutation: free
Number of permutations: 999
adonis2(formula = df1s.bc ~ HIV + candida + gender + smoking_current, data = df, permutations = 999)
Df SumOfSqs R2 F Pr(>F)
HIV 1 1.369 0.01250 4.3775 0.001 ***
candida 1 1.461 0.01333 4.6693 0.001 ***
gender 1 0.492 0.00449 1.5736 0.026 *
smoking_current 1 1.111 0.01014 3.5512 0.001 ***
Residual 336 105.108 0.95953
Total 340 109.541 1.00000
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
No change; not too surprising, as all four main effects were detected easily.
Now go to data visualization
#Run 3d first
df1s.mds3 <- metaMDS(df1s.bc,k=3,autotransform=FALSE,try=40,trymax=80,maxit=200)
Run 0 stress 0.1573956
Run 1 stress 0.1630191
Run 2 stress 0.1576127
... Procrustes: rmse 0.0118972 max resid 0.1519299
Run 3 stress 0.1581654
Run 4 stress 0.1576599
... Procrustes: rmse 0.009852819 max resid 0.1527284
Run 5 stress 0.1574955
... Procrustes: rmse 0.012444 max resid 0.1517661
Run 6 stress 0.1587469
Run 7 stress 0.1575933
... Procrustes: rmse 0.01163693 max resid 0.1521229
Run 8 stress 0.1579649
Run 9 stress 0.1578527
... Procrustes: rmse 0.006323981 max resid 0.09982249
Run 10 stress 0.1574957
... Procrustes: rmse 0.01120014 max resid 0.1523462
Run 11 stress 0.1577498
... Procrustes: rmse 0.009888467 max resid 0.1527233
Run 12 stress 0.157409
... Procrustes: rmse 0.007115967 max resid 0.09726598
Run 13 stress 0.1573745
... New best solution
... Procrustes: rmse 0.01073383 max resid 0.1524203
Run 14 stress 0.1573637
... New best solution
... Procrustes: rmse 0.009102594 max resid 0.1079765
Run 15 stress 0.1578146
... Procrustes: rmse 0.01085179 max resid 0.153061
Run 16 stress 0.1579197
Run 17 stress 0.1579066
Run 18 stress 0.1575399
... Procrustes: rmse 0.01237825 max resid 0.1521774
Run 19 stress 0.1575654
... Procrustes: rmse 0.00849738 max resid 0.1079133
Run 20 stress 0.1577392
... Procrustes: rmse 0.01138932 max resid 0.1522989
Run 21 stress 0.1577813
... Procrustes: rmse 0.01136102 max resid 0.1528569
Run 22 stress 0.159205
Run 23 stress 0.1579391
Run 24 stress 0.1591617
Run 25 stress 0.1577727
... Procrustes: rmse 0.008387684 max resid 0.0967774
Run 26 stress 0.1579649
Run 27 stress 0.1582177
Run 28 stress 0.1576498
... Procrustes: rmse 0.01137476 max resid 0.1525317
Run 29 stress 0.1591301
Run 30 stress 0.157277
... New best solution
... Procrustes: rmse 0.008559541 max resid 0.1081297
Run 31 stress 0.1579428
Run 32 stress 0.1576009
... Procrustes: rmse 0.01273404 max resid 0.1520734
Run 33 stress 0.1576118
... Procrustes: rmse 0.01247834 max resid 0.1523452
Run 34 stress 0.1582027
Run 35 stress 0.158822
Run 36 stress 0.158023
Run 37 stress 0.1575531
... Procrustes: rmse 0.007344253 max resid 0.1083239
Run 38 stress 0.157274
... New best solution
... Procrustes: rmse 0.0003543994 max resid 0.003824715
... Similar to previous best
Run 39 stress 0.1578054
Run 40 stress 0.1578922
*** Best solution repeated 1 times
stressplot(df1s.mds3, main="Shepard plot")
df1s.mds3
Call:
metaMDS(comm = df1s.bc, k = 3, try = 40, trymax = 80, autotransform = FALSE, maxit = 200)
global Multidimensional Scaling using monoMDS
Data: df1s.bc
Distance: bray
Dimensions: 3
Stress: 0.157274
Stress type 1, weak ties
Best solution was repeated 1 time in 40 tries
The best solution was from try 38 (random start)
Scaling: centring, PC rotation, halfchange scaling
Species: scores missing
# Now look at 2d
df1s.mds2 <- metaMDS(df1s.bc,k=2,autotransform=FALSE,try=40,trymax=80,maxit=200)
Run 0 stress 0.2249552
Run 1 stress 0.2251211
... Procrustes: rmse 0.009373779 max resid 0.1481459
Run 2 stress 0.2255671
Run 3 stress 0.2261504
Run 4 stress 0.2254572
Run 5 stress 0.2257456
Run 6 stress 0.2265326
Run 7 stress 0.2270347
Run 8 stress 0.2258506
Run 9 stress 0.2258621
Run 10 stress 0.225423
... Procrustes: rmse 0.01541529 max resid 0.2310901
Run 11 stress 0.2248945
... New best solution
... Procrustes: rmse 0.00452352 max resid 0.08220424
Run 12 stress 0.2257927
Run 13 stress 0.2249545
... Procrustes: rmse 0.004482882 max resid 0.0822275
Run 14 stress 0.2253883
... Procrustes: rmse 0.01627316 max resid 0.2347665
Run 15 stress 0.224955
... Procrustes: rmse 0.004520226 max resid 0.0822353
Run 16 stress 0.2250835
... Procrustes: rmse 0.008578052 max resid 0.1524037
Run 17 stress 0.2275692
Run 18 stress 0.2254386
Run 19 stress 0.2254065
Run 20 stress 0.2253846
... Procrustes: rmse 0.01579237 max resid 0.2345835
Run 21 stress 0.2248481
... New best solution
... Procrustes: rmse 0.006292327 max resid 0.08260263
Run 22 stress 0.2264731
Run 23 stress 0.2250843
... Procrustes: rmse 0.01060527 max resid 0.1508133
Run 24 stress 0.2251181
... Procrustes: rmse 0.008233582 max resid 0.1483655
Run 25 stress 0.2255503
Run 26 stress 0.2261628
Run 27 stress 0.2251126
... Procrustes: rmse 0.008253967 max resid 0.1490181
Run 28 stress 0.2265968
Run 29 stress 0.2248932
... Procrustes: rmse 0.006309882 max resid 0.08268836
Run 30 stress 0.2253832
Run 31 stress 0.2277665
Run 32 stress 0.2258463
Run 33 stress 0.2253349
... Procrustes: rmse 0.009301725 max resid 0.1493075
Run 34 stress 0.2256928
Run 35 stress 0.2254409
Run 36 stress 0.2248496
... Procrustes: rmse 0.0002922941 max resid 0.005119062
... Similar to previous best
Run 37 stress 0.2248936
... Procrustes: rmse 0.006317961 max resid 0.08246939
Run 38 stress 0.2260217
Run 39 stress 0.2273473
Run 40 stress 0.2255819
*** Best solution repeated 1 times
stressplot(df1s.mds2, main="Shepard plot")
df1s.mds2
Call:
metaMDS(comm = df1s.bc, k = 2, try = 40, trymax = 80, autotransform = FALSE, maxit = 200)
global Multidimensional Scaling using monoMDS
Data: df1s.bc
Distance: bray
Dimensions: 2
Stress: 0.2248481
Stress type 1, weak ties
Best solution was repeated 1 time in 40 tries
The best solution was from try 21 (random start)
Scaling: centring, PC rotation, halfchange scaling
Species: scores missing
3D model fits better than 2D. Stress for 2D above 0.2
Start with plots where symbol colour indicates levels of one of the factors
a<-as.data.frame(scores(df1s.mds3))
a<-cbind(df[c(1:4)],a) #Add site names & symbols from original data file
p1<-ggplot(data=a, aes(x=NMDS1, y=NMDS2, color=HIV) )+
geom_point()+
stat_ellipse()+
labs(y="MDS2", x="MDS1")+
scale_shape_manual(values=c(16,17),
guide =
guide_legend(label.theme = element_text(size=6),
title=NULL)
)
p2<-ggplot(data=a, aes(x=NMDS1, y=NMDS3, color = HIV) )+
geom_point()+
stat_ellipse()+
labs(y="MDS3", x="MDS1")+
scale_shape_manual(values=c(16,17),
guide =
guide_legend(label.theme = element_text(size=6),
title=NULL)
)
p1 + p2 + plot_layout(guides='collect') + plot_annotation(title="HIV status") & theme_qk() & scale_color_uchicago()
p1<-ggplot(data=a, aes(x=NMDS1, y=NMDS2, color=candida) )+
geom_point()+
stat_ellipse()+
labs(y="MDS2", x="MDS1")+
scale_shape_manual(values=c(16,17),
guide =
guide_legend(label.theme = element_text(size=6),
title=NULL)
)
p2<-ggplot(data=a, aes(x=NMDS1, y=NMDS3, color = candida) )+
geom_point()+
stat_ellipse()+
labs(y="MDS3", x="MDS1")+
scale_shape_manual(values=c(16,17),
guide =
guide_legend(label.theme = element_text(size=6),
title=NULL)
)
p1 + p2 + plot_layout(guides='collect') + plot_annotation(title="Candida") & theme_qk() & scale_color_uchicago()
p1<-ggplot(data=a, aes(x=NMDS1, y=NMDS2, color=gender) )+
geom_point()+
stat_ellipse()+
labs(y="MDS2", x="MDS1")+
scale_shape_manual(values=c(16,17),
guide =
guide_legend(label.theme = element_text(size=6),
title=NULL)
)
p2<-ggplot(data=a, aes(x=NMDS1, y=NMDS3, color = gender) )+
geom_point()+
stat_ellipse()+
labs(y="MDS3", x="MDS1")+
scale_shape_manual(values=c(16,17),
guide =
guide_legend(label.theme = element_text(size=6),
title=NULL)
)
p1 + p2 + plot_layout(guides='collect') + plot_annotation(title="Sex") & theme_qk() & scale_color_uchicago()
p1<-ggplot(data=a, aes(x=NMDS1, y=NMDS2, color=smoking_current) )+
geom_point()+
stat_ellipse()+
labs(y="MDS2", x="MDS1")+
scale_shape_manual(values=c(16,17),
guide =
guide_legend(label.theme = element_text(size=6),
title=NULL)
)
p2<-ggplot(data=a, aes(x=NMDS1, y=NMDS3, color = smoking_current) )+
geom_point()+
stat_ellipse()+
labs(y="MDS3", x="MDS1")+
scale_shape_manual(values=c(16,17),
guide =
guide_legend(label.theme = element_text(size=6),
title=NULL)
)
p1 + p2 + plot_layout(guides='collect') + plot_annotation(title="Current smoking") & theme_qk() & scale_color_uchicago()
Try some visualizations of pairs of factors using MDS1 & MDS2 (while 3D is preferred, stress of 3D is around .15, vs 0.22, so not huge difference) Separate HIV+/-, then use symbol colour to separate second factor.
p1<-ggplot(data=subset(a, candida=="Yes"), aes(x=NMDS1, y=NMDS2, color=HIV) )+
geom_point()+
stat_ellipse()+
labs(y="MDS2", x="MDS1",
title="Candida")+
scale_shape_manual(
guide =
guide_legend(label.theme = element_text(size=6),
title=NULL)
)
p2<-ggplot(data=subset(a, candida =="No"), aes(x=NMDS1, y=NMDS2, color = HIV) )+
geom_point()+
stat_ellipse()+
labs(y="MDS2", x="MDS1",
title = "No Candida")+
scale_shape_manual(
guide =
guide_legend(label.theme = element_text(size=6),
title=NULL)
)
p_combined <- p1 + p2 + plot_layout(guides='collect') & theme_qk() & scale_color_uchicago()
p_ranges_x <- c(ggplot_build(p_combined[[1]])$layout$panel_scales_x[[1]]$range$range,
ggplot_build(p_combined[[2]])$layout$panel_scales_x[[1]]$range$range)
p_ranges_y <- c(ggplot_build(p_combined[[1]])$layout$panel_scales_y[[1]]$range$range,
ggplot_build(p_combined[[2]])$layout$panel_scales_y[[1]]$range$range)
p_combined &
xlim(min(p_ranges_x), max(p_ranges_x)) &
ylim(min(p_ranges_y), max(p_ranges_y))
p1<-ggplot(data=subset(a, HIV=="Yes"), aes(x=NMDS1, y=NMDS2, color=candida) )+
geom_point()+
stat_ellipse()+
labs(y="MDS2", x="MDS1",
title="HIV+")+
scale_shape_manual(
guide =
guide_legend(label.theme = element_text(size=6),
title=NULL)
)
p2<-ggplot(data=subset(a, HIV =="No"), aes(x=NMDS1, y=NMDS2, color = candida) )+
geom_point()+
stat_ellipse()+
labs(y="MDS2", x="MDS1",
title = "HIV-")+
scale_shape_manual(
guide =
guide_legend(label.theme = element_text(size=6),
title=NULL)
)
p_combined <- p1 + p2 + plot_layout(guides='collect') & theme_qk() & scale_color_uchicago()
p_ranges_x <- c(ggplot_build(p_combined[[1]])$layout$panel_scales_x[[1]]$range$range,
ggplot_build(p_combined[[2]])$layout$panel_scales_x[[1]]$range$range)
p_ranges_y <- c(ggplot_build(p_combined[[1]])$layout$panel_scales_y[[1]]$range$range,
ggplot_build(p_combined[[2]])$layout$panel_scales_y[[1]]$range$range)
p_combined &
xlim(min(p_ranges_x), max(p_ranges_x)) &
ylim(min(p_ranges_y), max(p_ranges_y))
p1<-ggplot(data=subset(a, HIV=="Yes"), aes(x=NMDS1, y=NMDS2, color = smoking_current) )+
geom_point()+
stat_ellipse()+
labs(y="MDS2", x="MDS1",
title="HIV+")+
scale_shape_manual(
guide =
guide_legend(label.theme = element_text(size=6),
title=NULL)
)
p2<-ggplot(data=subset(a, HIV =="No"), aes(x=NMDS1, y=NMDS2, color = smoking_current) )+
geom_point()+
stat_ellipse()+
labs(y="MDS2", x="MDS1",
title = "HIV-")+
scale_shape_manual(
guide =
guide_legend(label.theme = element_text(size=6),
title=NULL)
)
p_combined <- p1 + p2 + plot_layout(guides='collect') & theme_qk() & scale_color_uchicago()
p_ranges_x <- c(ggplot_build(p_combined[[1]])$layout$panel_scales_x[[1]]$range$range,
ggplot_build(p_combined[[2]])$layout$panel_scales_x[[1]]$range$range)
p_ranges_y <- c(ggplot_build(p_combined[[1]])$layout$panel_scales_y[[1]]$range$range,
ggplot_build(p_combined[[2]])$layout$panel_scales_y[[1]]$range$range)
p_combined &
xlim(min(p_ranges_x), max(p_ranges_x)) &
ylim(min(p_ranges_y), max(p_ranges_y))